Improved GWO and its application in parameter optimization of Elman neural network

Traditional neural networks are trained with gradient descent methods, which cannot handle complex optimization problems well. We propose an improved grey wolf optimizer (SGWO) to search for a better network structure. GWO is improved with circle population initialization, an information interaction mechanism, and adaptive position update to enhance the search performance of the algorithm. SGWO is applied to optimize the Elman network structure, yielding a new prediction method (SGWO-Elman). The convergence of SGWO is analyzed theoretically, and the optimization ability of SGWO and the prediction performance of SGWO-Elman are examined in comparative experiments. The results show that (1) SGWO converges globally with probability 1, and its search process is a finite homogeneous Markov chain with an absorbing state; (2) SGWO achieves better optimization performance on complex functions of different dimensions, and when applied to Elman parameter optimization it significantly improves the network structure, giving SGWO-Elman accurate prediction performance.


Introduction
The Elman neural network is a typical local regression network [1]. It has been widely used in image recognition, fault detection, and big data prediction because of its strong memory capacity and high computational efficiency [2]. The performance of Elman is largely determined by its training process. Therefore, finding a high-quality training process has become a key problem in neural network research [3].
In the early 1990s, gradient descent and stochastic methods were the two main Elman training methods [4]. However, gradient descent methods have three main drawbacks [5]: difficulty in finding the global optimal solution, slow convergence, and high dependence on the initial parameters. Similarly, stochastic methods can also weaken the training ability through their parameter initialization. As a result, in the late 1990s, some studies constructed a neural network as a nonlinear optimization model to replace the original linear model [6]. Although this approach avoids computing gradient information, it is not applicable when the dimension exceeds the memory range. Accordingly, starting in 2000, some researchers treated the training of the network structure as an optimization problem of finding the optimal parameters in a finite space [7]. Some scholars solved this optimization problem with heuristic methods [8]. However, these methods needed to enlarge the search space when traversing the set of parameters, which increased the time complexity of the algorithm [9]. To explore better network structures and improve the performance of neural networks, metaheuristic algorithms have become reliable alternatives [10]. Compared to gradient descent methods, metaheuristic algorithms are more efficient at avoiding local extrema. These algorithms shift from local search to global search, making them more suitable for global optimization. Therefore, researchers have used metaheuristic algorithms in Elman as an optimization strategy for network structures, and a series of meaningful results have been achieved so far. For example, Zhang et al. used an improved arithmetic optimizer (IAO) to train the Elman network structure [11]. For the soil salinity prediction problem, the sine cosine algorithm (SCA) was applied to adjust the parameters of Elman [12], and the experimental results demonstrated that SCA could improve the prediction efficiency of Elman. Some researchers used the particle swarm optimization (PSO) algorithm to optimize Elman parameters, and PSO-Elman models were constructed for load prediction [13], compaction density evaluation [14], and parameter evaluation [15]. Metaheuristic algorithms have also been combined to adjust the weights and thresholds of Elman; for example, the ant colony algorithm (ACO) and the genetic algorithm (GA) were combined to form AGA-Elman [16]. Sun et al. developed an Elman prediction model based on the whale optimization algorithm (WOA) [17], and the experimental results showed that WOA-Elman has good engineering utility in porosity prediction. In addition, WOA-Elman also played an important role in weather prediction [18] and landslide probability prediction [19].
Although various metaheuristic algorithms have been deployed and studied to train Elman, the problem of local extrema remains. The grey wolf optimizer (GWO) [20] is a recently proposed metaheuristic algorithm inspired by the hierarchy and hunting process of wolves. GWO has three leaders, which are responsible for guiding the wolves to attack, delivering attack information, and leading the pack to encircle the prey [21]. During iteration, the three wolves continuously update their positions and thus search for the global optimum. Owing to its few parameters, easy implementation and strong convergence, GWO has shown excellent performance in solving high-dimensional optimization problems [22]. However, the global search capability of GWO is still poor, and it easily falls into local extrema. Moreover, the well-known No Free Lunch theorem [23] states that no universal metaheuristic algorithm can solve all optimization problems. Therefore, our research focuses on two points: first, to propose a more efficient improved grey wolf optimizer based on the characteristics of the algorithm; second, to explore a better method for training network structures based on the improved grey wolf optimizer. Accordingly, we propose an Elman training method based on the improved grey wolf optimizer (SGWO). SGWO introduces three strategies into the wolf hunting process: circle chaotic mapping, an information interaction mechanism, and an adaptive position update strategy. We use circle chaotic mapping to increase the population diversity. In the information interaction mechanism, the head wolf position is perturbed by Cauchy variation to jump out of local optima, and the information transfer between wolves is enhanced by the golden sine algorithm, thus accelerating the convergence of SGWO. Meanwhile, the adaptive position update strategy adjusts the search range autonomously, enabling SGWO to balance global and local search.
In addition, we introduce a Markov process and probabilistic analysis to demonstrate the convergence of SGWO. Ablation experiments on the three strategies are also conducted, and SGWO is compared with seven optimization algorithms to analyze its optimization performance. On this basis, we incorporate SGWO into the Elman training process and construct an SGWO-Elman prediction model. SGWO-Elman is also compared with three types of algorithms (Elman neural networks based on other optimization algorithms, other neural networks, and other neural networks based on SGWO) to verify the prediction ability of the SGWO-Elman model on complex problems.
The rest of the paper is organized as follows. Metaheuristic algorithms classification and variants of GWO are mentioned in Section 2. Section 3 gives a brief description of the grey wolf optimizer. The improved grey wolf optimizer (SGWO) is introduced and proved in Section 4. Section 5 proposes and describes an Elman training method based on SGWO. Experiments and results are discussed in Section 6. Finally, we conclude with a summary of the current work and future research efforts.

Related work
Compared with traditional optimization algorithms, optimization techniques that mimic natural phenomena have dominated the field of optimization. These are also known as metaheuristic algorithms. Metaheuristic algorithms are mainly divided into three categories: evolutionary algorithms (EA), physics-based algorithms, and swarm intelligence (SI) based algorithms [24].
EAs mimic the rules of natural evolution. The genetic algorithm (GA) [25] is a very popular EA. In GA, the initial solution is randomly generated and continuously updated through crossover and mutation operations until the optimal solution is found by iteration. Following GA, many studies have proposed new algorithms, such as differential evolution (DE) [26], the covariance matrix adaptation evolution strategy (CMAES) [27], and evolutionary programming (EP) [28].
Physics-based algorithms are inspired by phenomena in the physical world, such as gravity and explosions. Among them, gravitational local search (GLS) [29], the multi-verse optimization algorithm (MVO) [30], the sine cosine optimization algorithm (SCA) [12], and the atom search optimization algorithm (ASO) [31] are classic physics-based algorithms. In GLS, the searched individuals are viewed as objects moving in space, attracting each other through gravitational interaction. Gravity forces individuals to move towards the individual with the greatest mass, gradually approaching the optimal solution.
SI is inspired by the collective behavior and natural rules of groups such as bees or herds. SI includes the moth-flame optimization algorithm (MFO) [32,33], the white shark optimizer (WSO) [34], the whale optimization algorithm (WOA) [17], the sparrow search optimization algorithm (SSA) [35], and others. In SI, particle swarm optimization (PSO) [13] is the most popular algorithm, which updates the locations of birds to find the most food.
The grey wolf optimizer (GWO) [20] is a recently proposed metaheuristic algorithm. GWO is widely used to solve optimization problems because it has few parameters and converges quickly. However, GWO still has poor global search ability and easily falls into local extrema. Recently, many studies have improved the GWO algorithm in different ways. Some studies have proposed population diversity strategies to balance the initial population distribution. Some works have focused on adjusting the parameters of GWO, i.e., A and C. Other works have adjusted the location update strategies to improve GWO performance. A further line of related work combines the GWO algorithm with other existing metaheuristic algorithms. Although the SGWO algorithm is fundamentally different from previous methods, we still need to discuss these categories of improvements in detail.
Modifying the random positions of the initial population can balance the spatial distribution of the population. Chaotic mapping and opposition learning strategies have been widely used to generate the initial population. Among chaotic mapping strategies, Luo et al. [21] proposed tent-line coupled chaotic mapping to initialize the population, which ensured that the GWO algorithm generated diverse populations. Another improved GWO algorithm used a two-dimensional chaotic map to initialize the population [22]. Zhao et al. generated the GWO initial population through Chebyshev chaotic mapping, ensuring the diversity of the initial population and enhancing the global search ability of GWO [36]. In addition, some studies have integrated several chaotic maps: Xu et al. applied integrated mapping systems (CLS) to GWO to increase its population diversity and accelerate the convergence of the algorithm [37]. Besides chaotic mapping, a pseudo-antithesis number generation method based on the opposition learning strategy was used to improve the population distribution [38]. Another improved GWO generated opposition wolves with a lens imaging learning strategy [39]. These population diversity strategies succeed in balancing the initial population distribution and improving the algorithm's performance.
Some algorithms have improved GWO performance by modifying and adjusting parameters. Song et al. [40] proposed IGWO, which enhanced exploration by changing the linear convergence factor to a nonlinear one. Another improved grey wolf optimizer adjusted a nonlinear parameter of GWO based on polynomials [41] and showed accurate measurement results in the optimization of seepage parameters. However, these nonlinear strategies have only succeeded in improving the performance of GWO in some aspects. For example, the improved GWO in [42] improved convergence on unimodal functions but performed poorly on multimodal functions. Besides parameter update equations, a fuzzy method [43] was used for the adaptive adjustment of the control parameters. The exploration-enhanced grey wolf optimizer (IEE-GWO) [44] used a nonlinear control parameter strategy and was shown to converge quickly when solving unimodal functions. There are many excellent parameter adjustment strategies for improving GWO, but this kind of method makes the algorithm perform well only on specific problems.
Some improved GWO variants introduced location update strategies, making GWO suitable for a variety of optimization problems. A new search strategy named dimension learning-based hunting (DLH) [45] was introduced in IGWO; it was inherited from the individual hunting behavior of wolves and shared neighboring information. An improved GWO variant used two strategies, neighbor gaze cue learning (NGCL) and random gaze cue learning [46]; these two strategies update the locations of wolves and achieve a balance between exploration and exploitation. Besides, the multi-stage grey wolf optimizer (MGWO) [47] updates wolves in three stages while maintaining convergence speed.
Some other variants hybridize GWO with other search strategies or metaheuristic algorithms to improve its performance. For example, the genetic algorithm (GA) and GWO were combined to reduce the dimension of the obtained feature vector [48]. In another similar work, a novel improved GWO called the collaboration-based hybrid GWO-SCA optimizer was developed [49]. Experimental results indicated that it was a high-performing algorithm in global optimization. With the same goal, a metaheuristic optimization algorithm called hybrid PSO-GWO [50] has recently been proposed to improve exploitation and exploration ability.

Grey wolf optimizer
GWO consists of hunting mechanisms and mathematical models. In its hunting mechanism, GWO simulates predation behavior according to the natural hierarchy of the pack. The grey wolves are divided into four grades: alpha (α), beta (β), delta (δ) and omega (ω). Within the group, each level of grey wolves has a different responsibility. As the leader, the α wolf has a powerful effect on the group and determines the hunting direction of the wolves; the β wolf is in the second level, helps the α wolf in decision-making, and dictates instructions to wolves lower in the hierarchy; the δ wolf is in the third level and follows the arrangements of α and β; ω is at the bottom of the hierarchy. GWO hunting is abstracted as searching for optimal values. Specifically, it can be described by the following mathematical model.

Mathematical model for encircling the prey
The first process of hunting is encircling the prey. Eq (2) updates the position of the grey wolf using the distance between the grey wolf and the prey computed in Eq (1).
$$D = |C \cdot X_p(t) - X(t)| \tag{1}$$
$$X(t+1) = X_p(t) - A \cdot D \tag{2}$$
where X_p denotes the prey position, X(t) refers to a grey wolf position, X(t+1) represents the location of the grey wolf in the next iteration, and D represents the distance between the grey wolf and its prey. C is the oscillation factor and A is the convergence factor. When |A|>1, wolves conduct a large-scale search over the global scope. When |A|<1, wolves conduct a fine search of local areas. These factors are expressed by the following formulas:
$$A = 2a \cdot r_1 - a \tag{3}$$
$$C = 2 r_2 \tag{4}$$
$$a = 2 - \frac{2t}{T} \tag{5}$$
where r_1, r_2 ∈ [0,1] are random variables, a represents the distance control parameter that decreases linearly from 2 to 0, t is the current number of iterations and T is the maximum number of iterations.
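The encircling step can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard GWO formulation (A = 2a·r1 − a, C = 2·r2, a = 2 − 2t/T); the function names `control_parameter` and `encircle_step` are hypothetical and not taken from the paper.

```python
import numpy as np

def control_parameter(t, T):
    """Distance control parameter a, decreasing linearly from 2 to 0."""
    return 2.0 - 2.0 * t / T

def encircle_step(x, x_prey, a, rng):
    """One encircling update of a single wolf toward a known prey position."""
    r1 = rng.random(x.shape)      # r1 in [0, 1)
    r2 = rng.random(x.shape)      # r2 in [0, 1)
    A = 2.0 * a * r1 - a          # convergence factor, in [-a, a)
    C = 2.0 * r2                  # oscillation factor, in [0, 2)
    D = np.abs(C * x_prey - x)    # distance between wolf and prey
    return x_prey - A * D

rng = np.random.default_rng(0)
x = np.array([3.0, -1.0])
x_prey = np.array([0.0, 0.0])
x_next = encircle_step(x, x_prey, control_parameter(t=10, T=100), rng)
```

When a reaches 0 at the last iteration, A is 0 and the wolf lands exactly on the prey position, which matches the description of a fine local search for small |A|.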

Mathematical model for hunting mechanism
When the grey wolf tracks the prey's position in nature, the α wolf leads the β wolf and δ wolf to surround the prey. However, in a simulated search space we do not know the prey location. To build the hunting model, the optimal, sub-optimal, and third-optimal solutions are used as the α, β and δ wolf positions. We suppose that these three solutions guide the other wolves to attack the prey. The positions of the first three wolves change as follows:
$$D_\alpha = |C_1 \cdot X_\alpha - X(t)|, \quad D_\beta = |C_2 \cdot X_\beta - X(t)|, \quad D_\delta = |C_3 \cdot X_\delta - X(t)| \tag{6}$$
$$X_1 = X_\alpha - A_1 \cdot D_\alpha, \quad X_2 = X_\beta - A_2 \cdot D_\beta, \quad X_3 = X_\delta - A_3 \cdot D_\delta \tag{7}$$
where X_α, X_β, X_δ represent the current positions of α, β and δ; D_α, D_β, D_δ represent the distances between the three wolves and the prey; X_1, X_2, X_3 represent the updated positions of the α, β and δ wolves; and A_1, A_2, A_3 are defined in Eq (3) and represent the convergence factors of α, β and δ, respectively. At this time, the three wolves are the closest to the prey in the pack. Therefore, individual positions are updated according to the α, β and δ wolf positions:
$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{8}$$
The wolves continuously search for the optimal solution according to the above process. After hunting, X_α is taken as the location of the prey.
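The leader-guided update can be illustrated with a short Python sketch. It assumes the standard GWO rule in which each wolf moves to the mean of three candidate positions guided by α, β and δ, and it assumes a minimization problem; `hunt_step` and the `sphere` test function are hypothetical names used only for illustration.

```python
import numpy as np

def hunt_step(wolves, fitness, a, rng):
    """Move every wolf toward the mean of three leader-guided candidates."""
    order = np.argsort([fitness(w) for w in wolves])
    leaders = wolves[order[:3]].copy()           # alpha, beta, delta
    new_wolves = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        candidates = []
        for leader in leaders:
            A = 2.0 * a * rng.random(x.shape) - a
            C = 2.0 * rng.random(x.shape)
            D = np.abs(C * leader - x)           # distance to this leader
            candidates.append(leader - A * D)    # X_1, X_2 or X_3
        new_wolves[i] = np.mean(candidates, axis=0)
    return new_wolves, leaders[0]

def sphere(x):
    """Simple unimodal test function with its minimum at the origin."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(1)
wolves = rng.uniform(-5.0, 5.0, size=(10, 3))
T = 50
for t in range(T):
    a = 2.0 - 2.0 * t / T                        # linear decay from 2 to 0
    wolves, alpha = hunt_step(wolves, sphere, a, rng)
```

The returned `alpha` is the best position found before the move, which plays the role of the estimated prey location after hunting.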
Compared to other population-based optimization algorithms, the grey wolf optimizer has some advantages. It has a simple structure with few parameters, it can find optimal results quickly owing to its unique hierarchy, and its low time complexity allows it to play an important role in practical optimization problems. However, there are still some disadvantages. For example, the grey wolf optimizer is prone to falling into local extrema. Therefore, proposing an effective improved grey wolf optimizer is one of our research objectives.

Improved grey wolf optimizer (SGWO)
To improve the optimization performance of the GWO algorithm, we propose SGWO based on the adaptive information interaction mechanism. The SGWO algorithm is described below in terms of its implementation method and algorithm steps.

Circle population initialization (cGWO)
In GWO, the optimal value is greatly constrained by the initial position. Compared with random initialization, chaotic maps are widely applied to generate the initial population because of their randomness. However, different chaotic maps have different effects. To find the optimal value quickly, we analyzed and compared the Sobol, Logistic, Iterative, and Circle maps [21,22]. The resulting distributions are all uniform, and the grey wolf group using Tent mapping is more evenly distributed in space than with the other maps. However, some individuals on the map are at the boundary, which affects the overall efficiency of the algorithm. Compared with the four mappings, the circle map has more boundary individuals. To strengthen the algorithm's handling of extreme value problems, and in view of the experimental results, this paper ultimately chose the circle map. The circle map [51] model is as follows: where X_t represents the population individual at the t-th iteration. The circle map is used only once, in the initialization step, to generate an initial population [20]. During iteration, GWO only uses this initial population once for position updates. The circle map can balance population distribution and reverse inhibition. When the algorithm falls into a local extremum, a uniform population distribution can help wolves move to the next location. Therefore, population initialization helps improve the exploration ability of SGWO.
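A circle-map initialization might look like the following Python sketch. The recurrence x_{t+1} = mod(x_t + b − (a/2π)·sin(2π·x_t), 1) with a = 0.5 and b = 0.2 is a commonly used form of the circle map and is an assumption here; the paper's exact parameters, the seed value `x0`, and the function name `circle_map_population` are not taken from the source.

```python
import numpy as np

def circle_map_population(n_wolves, dim, lower, upper, a=0.5, b=0.2, x0=0.7):
    """Generate an initial population from a circle chaotic sequence.

    a, b and x0 are assumed values, chosen only for illustration.
    """
    seq = np.empty(n_wolves * dim)
    x = x0
    for k in range(seq.size):
        # One circle-map iteration, kept inside [0, 1) by the modulo
        x = (x + b - (a / (2.0 * np.pi)) * np.sin(2.0 * np.pi * x)) % 1.0
        seq[k] = x
    chaos = seq.reshape(n_wolves, dim)
    # Scale the chaotic values from [0, 1) to the search bounds
    return lower + chaos * (upper - lower)

population = circle_map_population(30, 5, -10.0, 10.0)
```

The map is called once to fill the whole population, consistent with the statement that it is used only in the initialization step.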

Information interaction mechanism (iGWO)
In the information interaction mechanism, the hunting process is modeled as an information interaction process among wolves: the hunting path is the channel, the α position is the source point, the β position is the transmission station, and the subordinate wolves are the signal receiving points. Cauchy variation is used to change the position of the source point. The golden sine algorithm optimizes the information transmission process and enhances information exchange between wolves. Mathematically, the information interaction mechanism is constructed in two steps, explained as follows.
(1) Disturbing the source point. In GWO, the α wolf position is the source point, which determines the attack direction for the wolves. If the leader's position deviates, it will prolong the search time and reduce the search accuracy. Thus, the Cauchy variant [52], with its excellent local exploration ability, is used to optimize the head wolf, so that the α wolf can jump out of local extrema and avoid premature convergence. The standard Cauchy distribution function is as follows.

PLOS ONE
The standard Cauchy distribution function decays slowly from a flat peak toward both ends. A longer trailing tail increases the perturbation probability and lets the head wolf jump out of a local extremum quickly; a flat peak reduces its search time in the adjacent area and enhances the ability to search for the global optimal solution. The standard Cauchy operator is used to randomly disturb the α wolf's position. The position update formula for the α wolf is as follows: X_1 is defined in Eq (7) and represents the final position under the leadership of α in GWO; it is calculated from X_α. In Eq (11), X_1 is used as the initial position of α, and X′_1 is the new location of the α wolf, i.e., the final position of α in SGWO. The Cauchy variant helps the α wolf pass the best hunting position to the wolves, so that the wolves can quickly close in on the prey and speed up the search.
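The effect of a Cauchy perturbation can be sketched in Python. The multiplicative form X′ = X + X·Cauchy(0,1) used below is a common choice in the literature and is only an assumption about the shape of the update; `cauchy_perturb` is a hypothetical name. The second part compares tail mass with a Gaussian to illustrate why the heavy tail helps escape local extrema.

```python
import numpy as np

def cauchy_perturb(position, rng):
    """Perturb a position with standard Cauchy noise (assumed multiplicative form)."""
    step = rng.standard_cauchy(position.shape)
    return position + position * step

rng = np.random.default_rng(0)
x_alpha = np.array([0.5, -1.2, 2.0])
x_alpha_new = cauchy_perturb(x_alpha, rng)

# Heavy tails: large jumps are far more likely under Cauchy than Gaussian noise
samples = 10_000
cauchy_tail = np.mean(np.abs(rng.standard_cauchy(samples)) > 3.0)
normal_tail = np.mean(np.abs(rng.standard_normal(samples)) > 3.0)
```

In practice such a perturbation is often paired with a greedy check that keeps the new position only if its fitness improves; that selection step is not stated in the text and is left out here.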
(2) Optimizing the information transmission process. In GWO, the β wolf location is the transmission station in the communication channel. However, the suboptimal value cannot determine the distance from the β wolf to the α wolf and the δ wolf. Therefore, the information will be biased when the β wolf transmits the α wolf position to the subordinate wolves. When the algorithm is solving highly complex optimization problems, it is difficult to fully explore the solution space, which affects the search accuracy.
The golden sine algorithm (Golden-SA) [53] is a new metaheuristic optimization algorithm. It scans all points on the sine function via the unit circle, so that the solution space is fully traversed and the optimal solution can be found. The solution update is the core of the golden sine algorithm.
where X_i^t refers to the current individual position and P_i^t refers to the current optimal position. R_1 is a random variable in [0,2π] and R_2 is a random variable in [0,π]; they control the distance and direction of movement, respectively. The golden ratio τ is (√5 − 1)/2. The coefficients x_1 and x_2 are obtained from τ; they narrow the search space through a spiral search and keep approaching the optimal solution.
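A golden sine update can be sketched as follows. The sketch assumes the commonly cited Golden-SA form X ← X·|sin R1| − R2·sin R1·|x1·P − x2·X| with x1 = −π + (1 − τ)·2π and x2 = −π + τ·2π; sign conventions differ between papers, so this may not match the paper's Eq (12) term for term. The name `golden_sine_step` is hypothetical.

```python
import numpy as np

TAU = (np.sqrt(5.0) - 1.0) / 2.0             # golden ratio, about 0.618
X1 = -np.pi + (1.0 - TAU) * 2.0 * np.pi      # golden-section coefficient x_1
X2 = -np.pi + TAU * 2.0 * np.pi              # golden-section coefficient x_2

def golden_sine_step(x, p, r1, r2):
    """Move position x relative to the current best position p."""
    return x * np.abs(np.sin(r1)) - r2 * np.sin(r1) * np.abs(X1 * p - X2 * x)

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0])                    # current individual
p = np.array([0.0, 0.5])                     # current optimal position
r1 = rng.uniform(0.0, 2.0 * np.pi)           # R_1 in [0, 2*pi]
r2 = rng.uniform(0.0, np.pi)                 # R_2 in [0, pi]
x_new = golden_sine_step(x, p, r1, r2)
```

Because sin R1 takes both signs over [0, 2π], the step can move an individual in either direction, which is how the random variables adjust both distance and direction.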
Inspired by the golden section, the golden sine algorithm was incorporated into the GWO algorithm to change the movement of β wolf. The position update formula of the β wolf is as follows:

where D′_β represents the new distance between the β wolf and the prey, and X′_2 represents the new position of the β wolf in SGWO. X(t) is defined in Eq (1) and represents a grey wolf position. R_1 and R_2 are defined in Eq (12) and represent random variables in [0,2π] and [0,π]. Eq (15) is an update of Eq (7), in which A_2 still represents the convergence factor of β.
An analysis based on Fig 2 and Eq (15) shows that R_1 and R_2 constantly adjust the moving direction and moving distance of the β wolf, so that β can fully capture the information difference between the α and δ wolves. More specifically, the β wolf is kept at the golden section between the α and δ wolves (as in Fig 2A). This method enhances information exchange in GWO. In addition, SGWO can scan all points on the unit circle and continuously enclose the wolves into the sine function (as in Fig 2B). Thus, wolves gradually approach the prey position (the global optimal solution), improving search speed and efficiency.
This paper incorporates the Cauchy mutation and the golden sine algorithm into the GWO algorithm as the information interaction mechanism between wolves, which promotes information exchange among α, β, and the superior and subordinate wolves. The α and β wolves can release their decision results to subordinate wolves from the best transmission position. SGWO thus remedies the shortcomings of the traditional GWO algorithm and guides wolves to accelerate their approach to the prey.

Adaptive position update (aGWO)
The individual position update is a key process in hunting. However, GWO always refers to the three leading wolf locations and maintains a constant update mechanism, making it difficult to balance global exploration and local exploitation. Inspired by the decay of the learning rate in machine learning [54], we introduce an adaptive weight ω into the location update. We define ω in Eq (17). The updated position formula is as follows.
where a is the distance control parameter defined in Eq (5), X′_2 represents the updated position of β obtained by the golden sine algorithm, and X(t+1)′ represents the next-iteration position of a wolf, which is also its final position in SGWO. Because traditional inertia weights are set manually, they cannot conform to the wolves' hunting process. The proposed adaptive weight factor incorporates the distance control parameter so that the algorithm adjusts the search range autonomously in different periods. In the early stage of iteration, the algorithm searches the solution space globally with a large step; in the later stage, it searches the region finely. Setting p to 0.25 avoids losing the optimal solution and reducing the accuracy of the algorithm.
From Fig 3, in the early iterations ω is large, which helps in jumping out of local extrema; in the late iterations ω is smaller, which improves the local search capability. Integrating the adaptive weight into traditional GWO can balance global exploration ability and local exploitation ability, and find the global optimal solution quickly.
The adaptive location update mechanism is also suitable for other population-based optimization algorithms, such as the whale optimization algorithm (WOA) and the white shark optimizer (WSO). In these algorithms, the mechanism is applied to improve the location update formula. In practice, it automatically adjusts the search step of the population by changing the parameter values, which ensures that the algorithm has both global exploration ability and local exploitation ability.

Complexity analysis.
The time complexities of the algorithms in the comparative experiments are as follows: O(MVO) = T(n² + n × Dim × log n); O(MFO) = T(n² + n × Dim). From the pseudo-code, all improved strategies are included in the GWO cycle optimization process. Thus, SGWO and GWO have the same time complexity, O(SGWO) = T × n × Dim, where T is the maximum number of iterations, n is the population size, and Dim is the dimension. SGWO has few parameters, the final order is:

Exploitation and exploration analysis.
In the exploitation phase, GWO completes the hunting task by reducing the value of a, which decreases from 2 to 0 over the course of iterations. When |A|>1, the wolves deviate from their prey; when |A|<1, the wolves attack their prey. However, this approach leads to longer exploitation times and an inability to locate prey accurately. In SGWO, we introduce an information interaction mechanism in which the β wolf, located at the golden section, can accurately convey the position of the α wolf to its subordinate wolves. The wolves can quickly approach their prey through the information interaction mechanism. It is worth mentioning that the golden sine algorithm can scan all points on a unit circle and continuously enclose wolves into a sine function. Therefore, the information interaction mechanism can shorten the exploitation time of wolves. At the same time, we introduce an adaptive weight ω into SGWO, which can adjust the search range independently at different stages. As p increases, ω decreases rapidly, allowing SGWO to search the solution space globally in larger steps. Therefore, both the information interaction mechanism and the adaptive weight can improve the exploitation ability and ensure that the algorithm quickly converges to the optimal value.
In the exploration phase, GWO is prone to stagnation in local solutions. We introduce the circle mapping and the Cauchy distribution function to solve this problem. In the initial stage, circle mapping increases population diversity, which helps individuals caught in extrema find neighbors quickly. When the algorithm stalls, the α wolf changes position through the Cauchy mutation and once again leads the pack out of the stagnant region. In addition, the adaptive weight ω also takes effect during the exploration phase. At the end of the iteration, the amplitude of ω decreases as the value of λ increases, and the algorithm searches more accurately within this interval. Therefore, the adaptive weight can effectively balance the exploration and exploitation stages. SGWO thus emphasizes both exploitation and exploration, improving the convergence speed and efficiency of GWO.

Convergence analysis with Markov process and probability 1.
Previous research has indicated that the performance of metaheuristic algorithms can be improved. To date, however, no broad study has addressed the theoretical analysis of metaheuristic algorithms. We therefore introduce a Markov process and probability analysis to prove the convergence of SGWO.

Proof. (1) finite homogeneous Markov chain
Considering the wolf state shift probabilities in reference [55], it is known that P(T_φ(φ(t−1)) = φ(t)) is determined by the wolves' state shift probabilities P(T_φ(X(t−1)) = X(t)). According to Eq (15), P(T_φ(X(t−1)) = X(t)) is related only to the state X(t−1) at the previous moment, the vector coefficients C_i, and the distances D_α, D_β and D_δ between the first three wolves and their prey. Thus, according to the definition of the Markov chain, {φ(t): t>0} has the Markov property.
Because the search space of any optimization problem is finite, each x_i is finite, and the state space X is also finite. Because φ is composed of N φ and X is a countable set, φ is finite. Similarly, the wolves' state-space set ϕ is also finite. Therefore, {φ(t): t>0} is a finite Markov chain.
According to Eq (16), it is clear that X(t) is related only to the state X(t−1) at the previous moment, and not to the number of iterations. Thus, {φ(t): t>0} is a finite homogeneous Markov chain.

Proof. (2) Markov process with absorbing states
During each iteration, the algorithm records the current top three wolf positions, so SGWO still uses an elite retention strategy. Thus, the corresponding Markov process has absorbing states.
(2) Convergence analysis with probability 1
Theorem 2. The SGWO algorithm is globally convergent with probability 1.
Proof. To prove Theorem 2, we proceed in two steps: first we prove that SGWO is globally convergent, and then we prove that the probability of convergence is 1. From the literature [56], the conventional GWO algorithm is convergent, so that X(t+1) → X_g(t) when t → ∞. To prove the convergence of the SGWO algorithm, it is only necessary […] That is, in Eq (17), ω → 0 when t → ∞.
Therefore, SGWO is convergent. SGWO then satisfies the necessary and sufficient condition for global convergence in reference [55].
Thus, SGWO is a globally convergent algorithm. Assume that at some time t, X(t) enters the global optimal state solution set G. Then at time t −1, is the probability measure, then: We finally prove that the SGWO algorithm is a globally convergent algorithm with a probability of 1.

Elman neural network
The Elman neural network is divided into four layers: input layer, hidden layer, undertake layer, and output layer [1]. The connections among the input layer, hidden layer and output layer are similar to those of a feedforward network. The input layer units only transmit signals, while the output layer units perform weighting. There are two types of excitation functions for hidden layer units: linear and nonlinear. Generally, the excitation function is taken to be the nonlinear Sigmoid function [2]. The undertake layer remembers the output value of the hidden layer units at the previous moment and can be regarded as a delay operator with a one-step delay. The output of the hidden layer is fed back to the input of the hidden layer through the delay and storage of the undertake layer [3]. This connection makes the network sensitive to historical data. The internal feedback improves the ability to process dynamic information, thereby achieving dynamic modeling. The structure of the Elman network is shown in Fig 4. The Elman model can be described by Eq (18), where y is the node vector of the output layer; x is the node vector of the middle layer; u is the input vector; x_c is the feedback state vector; ω_1 is the connection weight from the hidden layer to the undertake layer; ω_2 is the connection weight from the input layer to the hidden layer; ω_3 is the connection weight from the hidden layer to the output layer; and b_1 and b_2 are the thresholds for the input layer and the hidden layer.
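The recurrence described above can be sketched as a forward pass in Python. This is a minimal illustration under stated assumptions: a sigmoid hidden activation, a linear output function, a zero initial feedback state, and thresholds added at the hidden and output units. The placement of b_1 and b_2 and the function name `elman_forward` are assumptions and may differ from the paper's Eq (18).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elman_forward(inputs, w1, w2, w3, b1, b2):
    """Run an Elman network over a sequence and return all outputs.

    w1: undertake (context) -> hidden, shape (h, h)
    w2: input -> hidden, shape (h, d)
    w3: hidden -> output, shape (o, h)
    """
    context = np.zeros(w1.shape[0])              # feedback state x_c, assumed 0 at start
    outputs = []
    for u in inputs:
        hidden = sigmoid(w1 @ context + w2 @ u + b1)
        outputs.append(w3 @ hidden + b2)         # assumed linear output function
        context = hidden                         # one-step delay of the hidden output
    return np.array(outputs)

rng = np.random.default_rng(0)
d, h, o = 3, 4, 2
sequence = rng.normal(size=(5, d))
y = elman_forward(sequence,
                  rng.normal(size=(h, h)), rng.normal(size=(h, d)),
                  rng.normal(size=(o, h)), np.zeros(h), np.zeros(o))
```

The `context = hidden` assignment is what makes the network sensitive to historical data: each hidden state depends on the previous one.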

SGWO-Elman model
When Elman performs a prediction task, it first randomly selects the initial values of the parameters, then continuously updates the sample space through network training, and finally determines the combination of parameters that best fits the characteristics of the sample set. Because the initial parameters are selected blindly, the prediction effect of the network predictor is reduced and the training process is prone to falling into local extrema. Therefore, it is necessary to find the best parameters at the initial time in order to train a better network structure. Better initial network parameters lead to a better-trained network structure in the iterative process, which not only enhances the adaptability of the predictor to the dataset but also improves the prediction accuracy. We introduced SGWO into the parameter optimization of the Elman neural network and proposed a new Elman prediction model (SGWO-Elman). This is another novel contribution of this paper.
The principle of the SGWO-Elman model is to recast the Elman network training problem as a weight optimization problem. Let the neural network structure be Net{$\omega_1$, $\omega_2$, $\omega_3$, $b_1$, $b_2$}. Let $X \in [x_1, x_2, \ldots, x_n]$ and $\hat{Y} \in [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n]$ be the input and predicted output sample spaces, and let $Y \in [y_1, y_2, \ldots, y_m]$ be the sample space to be measured. The search optimization objective of this paper is then given by Eq (19). This paper takes the parameter combination of the Elman neural network as the training goal, and the initial predictor is generated from Eq (19). Each predictor is treated as a gray wolf individual to obtain the initial population. Then, the minimum mean square error (MSE), Eq (20), is used as the fitness function. SGWO trains the network structure iteratively until the optimal parameter combination is determined. Finally, the optimal network predictor is obtained.
The optimization processes of Elman and SGWO-Elman in the search space are depicted in Fig 5. In Fig 5A, Elman uses a single-point search that follows the gradient descent route, which easily falls into a local extremum. In Fig 5B, SGWO-Elman completes neural evolution with the optimization algorithm, which realizes multi-point search in space. Compared with single-point optimization, SGWO is better able to find the global optimal solution. As a result of training with the optimization algorithm, the network search and parameter calculation abilities are improved.
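The idea of treating each parameter combination as a gray wolf individual can be sketched as follows. This is an illustration under assumptions rather than the authors' code: `decode`, `mse` and `make_fitness` are hypothetical names, the ordering of the flat vector (ω1, ω2, ω3, b1, b2) is an assumed encoding, and `predict` stands in for whatever forward pass maps a parameter vector and an input to a scalar prediction.

```python
def decode(position, n_in, n_hidden, n_out):
    """Split a flat wolf position into (w1, w2, w3, b1, b2) for an Elman net."""
    sizes = [n_hidden * n_hidden, n_hidden * n_in, n_out * n_hidden,
             n_hidden, n_out]
    if len(position) != sum(sizes):
        raise ValueError("position length does not match the network shape")
    parts, start = [], 0
    for size in sizes:
        parts.append(list(position[start:start + size]))
        start += size

    def rows(flat, n_cols):
        # Reshape a flat list into a list of rows with n_cols entries each.
        return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]

    w1 = rows(parts[0], n_hidden)   # undertake layer -> hidden layer
    w2 = rows(parts[1], n_in)       # input layer -> hidden layer
    w3 = rows(parts[2], n_hidden)   # hidden layer -> output layer
    return w1, w2, w3, parts[3], parts[4]


def mse(targets, predictions):
    """Mean square error between two equally long sequences of numbers."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def make_fitness(predict, inputs, targets):
    """Return a function that maps a wolf position to its MSE on the data."""
    def fitness(position):
        return mse(targets, [predict(position, x) for x in inputs])
    return fitness
```

With such a fitness function, any population-based optimizer only needs to compare scalar values: for a list `population` of candidate positions, `min(population, key=fitness)` returns the current best predictor. Keeping the encoding separate from the fitness makes it straightforward to swap the network model without touching the optimizer.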
(2) evaluation criteria
The initialization parameters of all algorithms were the same: the population size was 50 and the maximum number of iterations was 1000. For performance testing, 30 runs were performed in 50, 100 and 500 dimensions, respectively. Experimental results were presented in terms of: [...] all functions and dimensions. Therefore, SGWO has more advantages in exploitation ability.
3. SGWO achieves the best results in all experiments in different dimensions. The results of the other algorithms degrade significantly as the dimensionality increases, whereas SGWO is not susceptible to increased dimensions. In 500 dimensions, SGWO still converges to the theoretical optimal value on f1-f4, f6 and f8. On f5, the results of SGWO in 500 dimensions are better than those in 50 and 100 dimensions. On f7, the results of SGWO are the same in the three dimensions. This shows that SGWO not only has prominent advantages in low dimensions, but also gives the best experimental results in 500 dimensions. Thus, SGWO is well suited to high-dimensional problems and scales well with dimension.
To better compare the convergence speed of different algorithms, the convergence curves of two unimodal functions (f1, f2) and two multimodal functions (f6, f8) were analyzed in Fig 7.
1. On the unimodal functions, SGWO needs 300 iterations to converge to the theoretical optimal value on f1 and 500 iterations on f2. GWO has not reached the theoretical optimal value after 1000 iterations. The optimization results of the other algorithms, including mGWO and WSO, did not change significantly. This further demonstrates that SGWO can improve global exploitation capability.
2. As the dimension increases, the other algorithms change noticeably, while SGWO maintains better convergence speed and accuracy. Thus, SGWO has significant advantages in global exploitation ability.

PLOS ONE
Improved GWO and its application

Exploration analysis.
Compared to unimodal functions, multimodal functions have many local optima, which makes them more suitable for testing the exploration capabilities of algorithms.
1. For f6 and f8, SGWO still converges to the optimal value in different dimensions. Other algorithms, such as WSO and SCA, perform poorly on these two functions, and their results deteriorate as the dimension increases. Thus, SGWO provides very competitive results on f6 and f8. This indicates that SGWO has a strong advantage in escaping local extrema and has better exploration capability.
2. For f7, neither GWO nor SGWO makes significant progress over the other algorithms, which suggests that most metaheuristic algorithms are not well suited to optimizing f7. The inherent defects of GWO limit the effect of SGWO here. Nevertheless, the experimental results of SGWO on f7 are still slightly better than those of the other algorithms.
3. For f5, the results of all algorithms are not significantly different within the same dimension, but SGWO still has an advantage. This indicates that SGWO performs well on complex functions. As the dimension increases, SGWO obtains its best results in 500 dimensions, which indicates that it retains its advantage on high-dimensional problems.
4. From Fig 7, the SGWO algorithm has a faster search speed in the same dimension than the state-of-the-art WSO and mGWO. The SGWO curve has fewer turning points, while the other algorithms fall into local extreme points many times. Because SGWO incorporates a hybrid-strategy leadership mechanism, random disturbance prevents the head wolf from falling into a local extremum. Therefore, SGWO has good local exploration capability.
5. On the multimodal functions, SGWO, mGWO and WOA converge noticeably faster than the other algorithms. At the end of the iteration, the optimization results of the other algorithms no longer improve as the number of iterations increases. SGWO has an outstanding advantage on unimodal functions, but its optimization performance and convergence speed on multimodal functions still need to be improved.

Non-parametrical statistical tests.
A full comparison of optimizers must be supported by non-parametric significance tests. As the non-parametric test, the Friedman test [58] was used to examine the overall performance of all algorithms. The null hypothesis in this test was that all algorithms perform equally. The alternative hypothesis was that at least two algorithms differ. We used the Friedman test to analyze the results of Tables 2-4. Table 5 shows the results of the Friedman test.
From Table 5, the p-values for all three dimensions are smaller than 0.05. Therefore, the null hypothesis is rejected, which indicates that the algorithms differ significantly. In this case, we used the Nemenyi post-hoc test [59] for pairwise comparisons. The Nemenyi test requires the critical difference (CD):

$$
CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}}
$$

where k represents the number of algorithms and N represents the number of functions. After calculation, CD = 4.0429. To calculate the statistic, we ranked the algorithm performance for each problem and computed the mean rank of each algorithm. Table 6 shows the mean ranks. Table 7 shows the difference in mean rank between each algorithm and SGWO. If the difference between the mean ranks exceeds CD, the hypothesis that the two algorithms have the same performance is rejected. From Table 7, the mean rank differences of SCA, MFO and CMAES exceed CD in all dimensions. This indicates that SCA, MFO and CMAES all differ significantly from SGWO. There is no significant difference between the other algorithms and SGWO.
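The ranking and critical-difference steps of the Nemenyi procedure can be sketched in a few lines. The helper names `mean_ranks` and `nemenyi_cd` are hypothetical, ties are assumed to share the average of the ranks they span, and $q_\alpha$ is treated as an input that must be looked up in a studentized-range table for the chosen k and significance level; the sketch is not intended to reproduce the CD value reported above.

```python
import math


def mean_ranks(results):
    """Average rank of each algorithm over all problems (lower is better).

    results[p][a] is the score of algorithm a on problem p; tied scores
    share the average of the ranks they span.
    """
    n_alg = len(results[0])
    totals = [0.0] * n_alg
    for row in results:
        order = sorted(range(n_alg), key=lambda a: row[a])
        i = 0
        while i < n_alg:
            j = i
            # Extend j over every algorithm tied with the one at position i.
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2.0 + 1.0    # average of ranks i+1 .. j+1
            for a in order[i:j + 1]:
                totals[a] += shared
            i = j + 1
    return [t / len(results) for t in totals]


def nemenyi_cd(q_alpha, k, n):
    """Critical difference CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


def differs_from(reference, ranks, cd):
    """Indices of algorithms whose mean rank differs from the reference by more than CD."""
    return [a for a, r in enumerate(ranks)
            if a != reference and abs(r - ranks[reference]) > cd]
```

In use, `ranks = mean_ranks(table)` followed by `differs_from(0, ranks, nemenyi_cd(q, k, n))` lists the algorithms that are significantly different from the algorithm in column 0, which mirrors how the mean rank differences are compared with CD above.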

Wilcoxon test and ranking.
The Friedman and Nemenyi tests express the overall performance and individual differences of SGWO. However, it is still necessary to compare the algorithms on each function separately. For this purpose we used the Wilcoxon rank sum test [60]. Table 8 shows the p values between SGWO and the other algorithms at the 0.05 significance level.
It can be seen from Table 8 that SGWO differs significantly from all other algorithms except mGWO and WOA. In 50 and 100 dimensions, the results for mGWO and WOA are not applicable on f6 and f8, which shows that their difference from SGWO is less significant. In 500 dimensions, the results of WOA and GWO on f6 and f8 are higher than in 50 and 100 dimensions, indicating that there is a significant difference between SGWO and these two algorithms. Therefore, the SGWO algorithm is not affected by dimension and can be extended to high dimensions. SCA, MFO, WSO and CMAES have the same result on different functions, which indicates that SGWO is significantly different from SCA, MFO and WSO. However, CMAES differs less from SGWO in the overall comparison in Table 7. All the above analyses are consistent with the results in Table 7. Meanwhile, all results are the same on f1-f4, but the result on f5 is higher than on the other functions, which shows that every algorithm is less effective on f5. The results differ little across dimensions.
In conclusion, SGWO is superior to the other comparison algorithms, with significantly better optimization performance and comprehensive strength. In addition, we used the mean absolute error (MAE) to rank all algorithms [61]. The MAE expression is as follows:

$$
MAE = \frac{1}{N_f} \sum_{i=1}^{N_f} \left| Mean_i - o_i \right|
$$

where $Mean_i$ is the mean value obtained by the algorithm on the i-th benchmark function, $o_i$ is the theoretical optimal value of that function, and $N_f$ is the number of benchmark functions. Table 9 shows the MAE under different dimensions. Table 10 shows the overall MAE of each algorithm, MAE = (MAE_50 + MAE_100 + MAE_500)/3.
From Table 9, the algorithms rank differently in each dimension. As the dimension increases, the MAE values of the other algorithms change significantly, but SGWO maintains the optimal level. This indicates that SGWO has strong stability and is not easily affected by changes in dimension. In 50 and 100 dimensions, the ranking of the eight algorithms is the same: SGWO > mGWO > GWO > CMAES > WSO > SCA > MFO > WOA. CMAES ranks higher than GWO and mGWO in 500 dimensions, which shows that the relative performance of the algorithms differs across the three dimensions. GWO ranks below SGWO in all three dimensions. This shows that the comprehensive performance of GWO is better than that of the other comparison algorithms, and that the improved strategy proposed in this paper significantly improves the optimization effect of GWO. The sorting results in Table 10 [...]
From Fig 8, the fitness values of SGWO are lower than those of the other comparison algorithms, and even close to zero. This shows that the improved strategy based on an adaptive information interaction mechanism is effective for traditional GWO. The median of SGWO is lower than that of the other algorithms regardless of dimension or function type (unimodal or multimodal). This shows that SGWO obtains a better optimization effect after multiple iterations. At the same time, the interquartile range of SGWO is shorter than that of the other algorithms, which indicates that the optimization results of SGWO are more concentrated for each function and dimension.
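The MAE-based ranking can be illustrated with a short sketch. It assumes the usual form of the measure, the average over the $N_f$ benchmark functions of the absolute gap between $Mean_i$ and $o_i$; the function names `mae` and `rank_by_mae` and the dictionary layout of the input are hypothetical.

```python
def mae(means, optima):
    """Mean absolute error between algorithm means and theoretical optima."""
    if len(means) != len(optima):
        raise ValueError("one optimum is needed per benchmark function")
    return sum(abs(m - o) for m, o in zip(means, optima)) / len(means)


def rank_by_mae(mean_table, optima):
    """Sort algorithm names from lowest (best) to highest MAE.

    mean_table maps an algorithm name to its list of mean results,
    one entry per benchmark function, in the same order as optima.
    """
    scores = {name: mae(means, optima) for name, means in mean_table.items()}
    order = sorted(scores, key=scores.get)
    return order, scores


def overall_mae(mae_per_dim):
    """Average the MAE values obtained in the tested dimensions."""
    return sum(mae_per_dim) / len(mae_per_dim)
```

Here `overall_mae([mae_50, mae_100, mae_500])` corresponds to the averaged value (MAE_50 + MAE_100 + MAE_500)/3 used for the overall ranking.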

Experimental information.
To analyze the impact of the different strategies on the SGWO algorithm, we conducted comparative experiments on four algorithms. cGWO uses the first strategy, "Circle population initialization"; iGWO uses the second strategy, "Information interaction mechanism"; aGWO uses the third strategy, "Adaptive position update"; aWOA is the application of the third strategy to WOA.
To ensure a fair comparison, the initialization parameters of all algorithms were the same: the population size was 50 and the maximum number of iterations was 1000. For performance testing, 30 runs were performed in 50, 100 and 500 dimensions, respectively. Experimental results are presented in terms of:
• Best of 30 runs
• Standard deviation of 30 runs
Table 11 shows the results for GWO, cGWO, iGWO, aGWO, aWOA and SGWO in 50, 100 and 500 dimensions, respectively. Fig 9 shows the convergence curves of the different algorithms on the unimodal function f2 and the multimodal function f7.

cGWO analysis.
From Table 11, compared to GWO, the results of cGWO have improved slightly on all functions in different dimensions. This indicates that the circle population initialization strategy can improve the optimization ability of GWO. However, the improvement of cGWO is weaker than that of iGWO, aGWO and SGWO. Specifically, circle population initialization is used only once, in the initialization step, which weakens the effect of cGWO. From Fig 9, on the unimodal function f2, although the convergence speed of cGWO is slightly higher than that of GWO, it is still not as good as the other strategies. On the multimodal function f7, the convergence speed of cGWO is better than that of GWO and iGWO. At the same time, the number of transitions of cGWO is smaller than that of aGWO, aWOA and SGWO. This indicates that circle population initialization can help cGWO jump out of local extrema. Therefore, cGWO not only improves the exploration ability of GWO, but also contributes to improving SGWO.
Table 11. Experimental results of three strategies in 50, 100 and 500 dims.

iGWO analysis.
From Table 11, compared to GWO, the results of iGWO have improved significantly on all functions in different dimensions. On the unimodal functions f1-f4, iGWO improves the results by dozens of times. On f6 and f8, iGWO reaches the theoretical optimal value. This indicates that the information interaction mechanism can improve the convergence ability.
From Fig 9, on the unimodal function f2, the convergence speed of iGWO is higher than that of GWO and cGWO. On the multimodal function f7, the convergence speed of iGWO is lower than that of GWO, cGWO and aWOA at the beginning of the iteration, but higher at the end of the iteration. Therefore, the information interaction mechanism generally contributes to the efficiency of SGWO.
From Table 11, aGWO reaches the theoretical optimal value on f1-f4, f6 and f8. The results of aGWO are not significantly different from those of SGWO. This indicates that the adaptive position update strategy can improve the optimization performance of GWO and plays an important role in SGWO. Meanwhile, the adaptive position update strategy is the best of the three strategies. From Fig 9, the convergence speed of aGWO is the same as that of SGWO, and both converge quickly to the optimal value. In addition, we incorporated the adaptive position update strategy into WOA. Although the convergence performance of aWOA is not as good as that of aGWO, it is still superior to GWO. From Table 10, it can be seen that GWO performs better than WOA. Therefore, aGWO > aWOA > GWO > WOA. The adaptive position update strategy can therefore also improve the optimization performance of WOA, which further shows that it is an effective strategy.

Sensitivity analysis of parameters
This section investigates the sensitivity of the two control parameters of Eq (17), λ and p, which together control the change of ω over the iterations. Since ω plays an important role in balancing exploration and exploitation, it is necessary to conduct a sensitivity analysis on λ and p. Table 12 reports the mean of ω over 1000 iterations under various parameter combinations. As shown in Table 12, when p is constant, the mean value of ω gradually decreases as λ increases. When λ is constant, the mean value of ω gradually decreases and decays faster as p increases. In the seventh experiment, when λ and p reached their maximum, the mean value of ω was the minimum. These results indicate that λ and p are negatively correlated with ω.
Fig 10 shows the ω curves over 1000 iterations under various parameter combinations. When λ is constant, the value of ω decreases rapidly in the early stage of the iteration as p increases. This shows that p can shorten the exploitation time and help SGWO quickly find the optimal value range. At the end of the iteration, as the value of λ increases, ω quickly transitions to the exploration phase. As the number of iterations increases, the amplitude of ω decreases, which shows that SGWO refines the search solution. Therefore, we set λ to the maximum to improve the exploration performance of SGWO. Although increasing p accelerates the decrease in ω, SGWO needs to balance exploration and exploitation, so we set p to 0.25.

SGWO for practical applications
6.4.1. SGWO for tension/compression spring design problem.
The objective of this problem is to minimize the weight of a tension/compression spring [62]. This problem can be abstracted into the following mathematical model. In the model, x1 is the wire diameter, x2 is the mean coil diameter, and x3 is the number of active coils. Table 13 compares the results for the tension/compression spring design problem. Table 13 suggests that SGWO finds the design with the minimum weight for this problem. This further shows that SGWO can be applied to practical problems and exhibits better performance.
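For readers who want to experiment with this benchmark, the sketch below encodes the formulation that is commonly used for it in the literature: the weight (x3 + 2) x2 x1^2 subject to four inequality constraints. The constants, the variable bounds and the static penalty are assumptions taken from that common formulation, not from this paper, and the function names are hypothetical.

```python
# Assumed variable bounds: wire diameter x1, mean coil diameter x2, active coils x3.
BOUNDS = [(0.05, 2.0), (0.25, 1.3), (2.0, 15.0)]


def spring_weight(x):
    """Objective: weight of the spring, (x3 + 2) * x2 * x1^2."""
    x1, x2, x3 = x
    return (x3 + 2.0) * x2 * x1 ** 2


def spring_constraints(x):
    """Return g1..g4; a design is feasible when every value is <= 0."""
    x1, x2, x3 = x
    g1 = 1.0 - (x2 ** 3 * x3) / (71785.0 * x1 ** 4)
    denom = 12566.0 * (x2 * x1 ** 3 - x1 ** 4)
    if denom == 0.0:
        g2 = float("inf")   # degenerate design (x1 == x2), treated as infeasible
    else:
        g2 = ((4.0 * x2 ** 2 - x1 * x2) / denom
              + 1.0 / (5108.0 * x1 ** 2) - 1.0)
    g3 = 1.0 - 140.45 * x1 / (x2 ** 2 * x3)
    g4 = (x1 + x2) / 1.5 - 1.0
    return [g1, g2, g3, g4]


def penalized_weight(x, penalty=1e6):
    """Weight plus a static quadratic penalty for violated constraints."""
    violation = sum(max(0.0, g) ** 2 for g in spring_constraints(x))
    return spring_weight(x) + penalty * violation
```

A metaheuristic can minimize `penalized_weight` directly over `BOUNDS`; the static penalty is only one of several possible constraint-handling choices and is used here because it keeps the fitness a single scalar.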

SGWO for a large-scale optimization problem.
To examine the scalability of SGWO on large-scale optimization problems [63], we conducted a comparative experiment in 1000 dimensions. The experimental information is the same as that in section 6.1.1. Table 14 shows the results of the 8 algorithms on f1, f2, f6 and f8. From Table 14, SGWO can still find the theoretical optimal values in large-scale optimization problems. Among the other algorithms, SCA and MFO failed on f2, and the results of WSO are also very poor on all four functions. Therefore, SGWO is suitable for solving large-scale optimization problems and has strong stability.

Experimental information. (1) datasets information
To verify the performance of SGWO-Elman, we selected six benchmark datasets from the UCI (http://archive.ics.uci.edu/ml) database and conducted two groups of experiments. Because the datasets contain a few null values and feature indexes irrelevant to the study, they were preprocessed. The processed data information is shown in Table 15. To eliminate dimensional inconsistency, the data were normalized before being input into the prediction model. Table 16 shows the number of hidden layers for different datasets.
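The normalization step can be illustrated with a minimal example. The text does not state which normalization was applied, so the sketch below assumes column-wise min-max scaling to [0, 1]; the function name `min_max_normalize` and the handling of constant columns are likewise assumptions.

```python
def min_max_normalize(rows):
    """Scale every column of a table (list of equal-length rows) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0   # constant column maps to 0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled
```

In practice the minimum and maximum would be computed on the training portion only and reused for the test portion, so that the two sets share the same scale.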
(2) evaluation criteria
For performance testing, 10 runs were performed in each of the three comparative experiments. Experimental results are evaluated in terms of the MSE, Eq (20). The comparison methods of the three experiments are as follows.

(3) comparison methods
For the first comparative experiment, we selected SCA, MFO, the sparrow search optimization algorithm (SSA) [35] and the atom search optimization algorithm (ASO) [31]. They were combined with the Elman neural network to form SSA-Elman, MFO-Elman, ASO-Elman and SCA-Elman. These four optimization algorithms were compared with SGWO-Elman. The parameters of all optimization algorithms were set to the same values.
For the second comparative experiment, we selected the traditional Elman neural network, the standard back propagation neural network (BP), the radial basis function neural network (RBF) [70], and the generalized regression neural network (GRNN) [71]. The prediction effect of SGWO-Elman was determined by Elman. These four neural networks were compared with SGWO-Elman.
For the third comparative experiment, we selected the long short-term memory neural network (LSTM) [72] and RBF. They were combined with SGWO to form SGWO-LSTM and SGWO-RBF. These two neural networks were compared with SGWO-Elman. The parameters of all neural networks were set to the same values.

Comparison experiments based on optimization strategy.
Owing to the performance of SGWO, SGWO-Elman has better parameter optimization ability. To fairly analyze the optimization effect of SGWO on neural networks, Table 17 shows the comparison results of SGWO-Elman, SSA-Elman, MFO-Elman, ASO-Elman and SCA-Elman on six datasets. In Table 17, the MSE metric evaluates the predictive performance of the neural networks by comparing prediction errors. The MSE metric is the minimum mean square error defined in Eq (20). Table 18 shows the prediction rankings of each algorithm on the six datasets.

(1) prediction performance analysis
From Table 17, all results of SGWO-Elman are optimal except the std on D4 and D6, and they are significantly lower than those of the other algorithms. This indicates that SGWO can reduce the prediction error of Elman and improve its prediction accuracy. Compared with other evolutionary strategies, the SGWO algorithm based on an adaptive information interaction mechanism is an effective parameter optimization method. On the D1, D5 and D6 datasets, SSA-Elman, MFO-Elman, ASO-Elman, and SCA-Elman have large errors. Data analysis shows that D1 has a large amount of data and many features, and that the features of D5 and D6 are weakly correlated, so these three datasets are harder to predict. However, SGWO-Elman has a lower error on these three datasets, which indicates that SGWO-Elman is suitable for weakly correlated datasets and shows better prediction ability, stronger stability and higher robustness than the other algorithms. From Table 18, SGWO-Elman ranks first on all datasets in prediction performance, and the overall ranking is SGWO-Elman > SCA-Elman > MFO-Elman > ASO-Elman > SSA-Elman. Therefore, for the prediction problem, SGWO-Elman has accurate prediction performance, and for the parameter optimization problem, SGWO has a better optimization effect.
From Fig 11, the error of SGWO-Elman is lower than that of the other algorithms on all datasets. On D1, the errors of SGWO-Elman and SCA-Elman are close to zero, while those of ASO-Elman and MFO-Elman are very high, followed by SSA-Elman. On the D2-D4 datasets, the overall prediction error is low. Due to the limitations of D5 and D6, the MSE value of each algorithm is higher than on the other datasets, but the error distribution of SGWO-Elman is concentrated on D5. These results show that SGWO-Elman has higher prediction performance and prediction accuracy and is suitable for most data. In practical engineering problems, using SGWO-Elman for prediction can bring the greatest economic benefits to the project.
The results of the statistical analyses are presented as boxplots in Fig 12. From Fig 12, compared with the other algorithms, SGWO-Elman has a lower median, and its lower quartile is close to its upper quartile on all six datasets. There are almost no outliers for SGWO-Elman, whereas the other algorithms have more outliers on D1, D2, D4 and D5. The results show that SGWO-Elman has higher prediction performance and stability than the other algorithms. This verifies the good applicability of SGWO to Elman parameter optimization.
(3) training time analysis
To verify the running speed of SGWO-Elman, we tested five algorithms on six datasets. Table 19 records the average training time of each algorithm over 10 tests, and Fig 13 displays the histogram of Table 19.
From Table 19 and Fig 13, it can be seen that ASO-Elman has the shortest average training time on the six datasets, and SSA-Elman has the longest. The average runtime of SGWO-Elman is not significantly different from that of SCA-Elman and MFO-Elman. Overall, the average training time of SGWO-Elman is at a medium level.

Comparison experiments based on neural network.
The prediction effect of SGWO-Elman was determined by Elman. To fairly analyze the prediction advantages of SGWO-Elman over various neural networks, it was compared with the traditional Elman neural network, the standard BP neural network, the radial basis function neural network (RBF) [70], and the generalized regression neural network (GRNN) [71]. The experimental results are shown in Table 20.
We first compare the prediction results of Elman with those of BP, RBF, and GRNN in Table 20. On the mean, Elman only has advantages on D3 and D4; it is inferior to BP and RBF on D2 and D5, respectively, and has a large error on D1 and D6. This indicates that the overall prediction performance of Elman needs to be improved. The std results show that Elman reaches the lowest error more frequently than the other algorithms, which indicates that Elman has better stability. On the min, Elman and BP each have the lowest error on three datasets. On the max, Elman has the lowest error on only three datasets. These results indicate that Elman is prone to a large bias when predicting certain sample points, and that its prediction effect and robustness still need to be improved. According to the overall analysis, the comprehensive performance of Elman is slightly better than that of BP, RBF, and GRNN.
We then compare the prediction results of SGWO-Elman with those of the other algorithms in Table 20. In terms of mean and max, SGWO-Elman maintains the lowest error on all datasets and ranks first. The std results show that SGWO-Elman has the lowest error on four datasets, indicating that SGWO significantly improves the prediction ability and robustness of Elman. On the min, SGWO-Elman performs best on D2 and D6; it ranks first on D1, second on D3 and D5, and third on D4, and the values of SGWO-Elman on D5 are better than those of the other algorithms by several orders of magnitude. These results indicate that SGWO-Elman also provides better prediction ability for weakly correlated datasets.
Comprehensive analysis shows that SGWO-Elman generally has higher accuracy than the other algorithms, which clearly improves the stability and predictive ability of Elman and allows Elman to demonstrate a stronger memory function among neural networks. The neural evolution method based on SGWO is effective, and the neural network based on SGWO-Elman has higher prediction accuracy. SGWO-Elman plays a greater role in solving practical engineering problems with high complexity, keeping the misjudgment rate as low as possible to reduce the economic loss of engineering production.
Figs 14 and 15 show the MSE results and boxplots, respectively. From Fig 14, SGWO-Elman has a lower prediction error than the other neural networks on all datasets. On D1, D5 and D6, although the error of the other algorithms is very large, the error of SGWO-Elman is still close to zero. This shows that neither the other evolutionary strategies nor the other neural networks are well suited to these datasets, whereas SGWO-Elman shows better prediction performance. On the D2-D4 datasets, SGWO-Elman still maintains better prediction accuracy. The overall analysis shows that the new neural network evolution strategy proposed in this paper can remedy the shortcomings of traditional neural networks in parameter optimization. Elman based on SGWO is clearly superior to the other neural networks and shows excellent prediction ability on most datasets. From Fig 15, the prediction errors of SGWO-Elman are lower than those of the other neural networks, which indicates that the parameter optimization of the Elman neural network based on SGWO is effective. Unlike the other algorithms, SGWO-Elman has no outliers on any dataset. In addition, the boxes of SGWO-Elman are centered on all six datasets, which shows that the parameters obtained after multiple iterations give a relatively stable prediction effect.

6.5.4. SGWO for other neural networks.
SGWO can be extended to other types of neural networks, such as the long short-term memory neural network (LSTM) and RBF. We incorporated SGWO into LSTM and RBF. The implementation steps for SGWO-LSTM and SGWO-RBF are shown in Fig 16. To verify the advantages of SGWO-Elman in prediction and optimization capabilities, we compared SGWO-Elman, SGWO-LSTM and SGWO-RBF. Table 21 shows the experimental errors of the three algorithms on six datasets. From Table 21, it can be seen that the mean prediction error of SGWO-Elman on the six datasets is lower than that of SGWO-LSTM and SGWO-RBF, and that SGWO-LSTM has better prediction performance than SGWO-RBF. This indicates not only that SGWO as an optimization algorithm can significantly improve the prediction performance of Elman, but also that the prediction performance of SGWO-Elman is higher than that of the other neural networks. Meanwhile, SGWO-Elman has the lowest std value on four datasets, which shows that SGWO-Elman has predictive stability.

Conclusion
In this study, an improved grey wolf optimizer was proposed and applied to the parameter optimization of the Elman neural network as an evolutionary strategy. Through theoretical analysis and numerical experiments, the optimization performance and prediction performance of the model were explored, and the following conclusions were obtained:
1. SGWO with an adaptive information interaction mechanism was proposed. This method used circle mapping to initialize the population, strengthened the information exchange among wolves through the Cauchy variant and the Golden-Sine algorithm, and updated the position of wolves with an adaptive distance control weight.
2. Theoretical analysis proved that the global convergence probability of SGWO was 1, and that the process of SGWO was a finite homogeneous Markov chain with absorbing states. Numerical experiments with 8 benchmark functions showed that SGWO effectively improves convergence accuracy and optimization efficiency compared with the other 6 algorithms.
3. The prediction performance of SGWO-Elman model was explored through comparative experiments. The results showed that SGWO-Elman model has good prediction accuracy,